Introduction to R
A language about more than statistics
Dr. Peng Zhao (✉ peng.zhao@xjtlu.edu.cn)
Department of Health and Environmental Sciences
Xi’an Jiaotong-Liverpool University
1 Objectives
- Know what R can do
- Set up the R/RStudio environment
- Understand how R works
- Basic operations in R
2 Installation
Online compiler
Main program (Mandatory): R
Integrated Development Environment (Highly recommended): RStudio
R Packages
install.packages(c("beginr", "ggplot2", "GGally", "ggplotgui", "learnr", "mindr", "MSG", "pinyin", "Rcmdr", "plotly", "remotes", "swirl"))
remotes::install_github("pzhaonet/fecitr")
3 What is R
R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in popularity; as of April 2021, R ranks 16th in the TIOBE index, a measure of popularity of programming languages.
— Wikipedia: R (programming language)
R is far more than that: it is a way to communicate with your computer.
4 What can R do
Statistics/calculation with R
Graphs
- R graph gallery: https://www.r-graph-gallery.com/
- More:
Automated tasks
- Batch emails
- Batch downloading of data (eBird)
Academic communication
Fun
library(beginr)
plotcolors()
library(pinyin)
# convert Chinese characters to pinyin: '西交利物浦大学' is Xi'an Jiaotong-Liverpool University
py('西交利物浦大学', dic = pydic())
library(mindr)
mm(c('# Pros', '# Cons'), root = 'R language')
5 Basic operations
5.1 Hotkeys
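- Ctrl + Enter: run the current line or selection (RStudio)
- TAB: auto-complete function names, object names and file paths
- F1: open the help page of the function under the cursor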
5.2 Demo data
# save the built-in iris data set as a CSV file
write.csv(iris, 'dat.csv', row.names = FALSE)
5.3 Import data
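# stringsAsFactors = TRUE keeps Species as a factor (the default became FALSE in R 4.0.0)
dat <- read.csv('dat.csv', stringsAsFactors = TRUE)
5.4 Statistics/Calculation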
# mean and standard deviation
mean(dat$Sepal.Length)
sd(dat$Sepal.Length)
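mean() and sd() implement ordinary formulas: mean = sum(x) / n and sd = sqrt(sum((x - mean)^2) / (n - 1)). A minimal sketch that recomputes both by hand; it uses the built-in iris data (the same values written to dat.csv) so that it runs on its own, and the names x, n, m and s are illustrative only.

```r
# iris holds the same values that were written to dat.csv
x <- iris$Sepal.Length
n <- length(x)

# sample mean: sum of the values divided by their number
m <- sum(x) / n

# sample standard deviation: sd() divides by n - 1, not by n
s <- sqrt(sum((x - m)^2) / (n - 1))

# both comparisons should print TRUE
all.equal(m, mean(x))
all.equal(s, sd(x))
```

The n - 1 denominator gives the sample (not population) standard deviation, which is what sd() reports.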
# more statistics
summary(dat)
# groups
tapply(dat$Sepal.Length, dat$Species, mean)
tapply(dat$Sepal.Length, dat$Species, sd)
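Each tapply() call above returns a single statistic per group. As an alternative (a base R function swapped in here, not used in the original notes), aggregate() with a formula can return the mean and the standard deviation of every species in one table. A sketch on iris:

```r
# one row per species; because FUN returns a named vector, the
# Sepal.Length column of the result is a matrix with columns "mean" and "sd"
group_stats <- aggregate(Sepal.Length ~ Species, data = iris,
                         FUN = function(v) c(mean = mean(v), sd = sd(v)))
group_stats
```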
# analysis of variance
xx <- aov(Sepal.Length ~ Species, data = dat)
summary(xx)
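One-way ANOVA compares the spread between group means with the spread within groups: F = (SS_between / (k - 1)) / (SS_within / (n - k)), where k is the number of groups and n the number of observations. A sketch that reproduces the F value reported by summary(xx); it runs on iris so it is self-contained, and the variable names are illustrative.

```r
y   <- iris$Sepal.Length
grp <- iris$Species
k <- nlevels(grp)   # number of groups (3 species)
n <- length(y)      # number of observations

grand_mean  <- mean(y)
group_means <- tapply(y, grp, mean)
group_sizes <- tapply(y, grp, length)

# between groups: distance of each group mean from the grand mean, weighted by group size
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# within groups: distance of each value from its own group mean
ss_within <- sum((y - group_means[as.character(grp)])^2)

# ratio of the between-group and within-group mean squares
f_value <- (ss_between / (k - 1)) / (ss_within / (n - k))
f_value

# the same F value taken from the ANOVA table
summary(aov(Sepal.Length ~ Species, data = iris))[[1]][["F value"]][1]
```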
# regression
mylm <- lm(Petal.Width ~ Petal.Length, data = dat)
summary(mylm)
5.5 Graphs
plot(x = dat$Petal.Length,
y = dat$Petal.Width)
abline(mylm)
5.6 Packages
summary(dat)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
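The Min., 1st Qu., Median, 3rd Qu. and Max. rows of summary() are sample quantiles. A short sketch, using iris directly, that shows the same numbers for Sepal.Length with quantile() at its default settings:

```r
# minimum, quartiles and maximum (default probs = 0, 0.25, 0.5, 0.75, 1)
q <- quantile(iris$Sepal.Length)
q

# the interquartile range is the distance between the 3rd and 1st quartiles
unname(q[4] - q[2])
IQR(iris$Sepal.Length)
```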
library(fecitr)
plot_summary(dat, base = "hist", if_box = TRUE)
library(ggplot2)
ggplot(dat, aes(Petal.Length, Petal.Width)) +
  geom_point() +
  geom_smooth(method = "lm")
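Both abline(mylm) and geom_smooth(method = "lm") draw the ordinary least-squares line. With a single predictor its coefficients have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A sketch that checks this against lm(), using iris so it runs independently:

```r
x <- iris$Petal.Length
y <- iris$Petal.Width

# closed-form least-squares estimates for one predictor
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
c(intercept = intercept, slope = slope)

# the same two numbers as estimated by lm()
coef(lm(Petal.Width ~ Petal.Length, data = iris))
```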
library(GGally)
ggpairs(dat, aes(color = Species, alpha = 0.1))
library(plotly)
ggpairs(dat, aes(color = Species, alpha = 0.1)) |>
  ggplotly()
5.7 Export data
# add a new column: Sepal.Length centred on its mean
dat$new <- dat$Sepal.Length - mean(dat$Sepal.Length)
write.csv(dat, "dat2.csv", row.names = FALSE)
5.8 GUI
library(Rcmdr)  # loading Rcmdr opens the R Commander window
library(ggplotgui)
ggplot_shiny()  # build ggplot2 graphs interactively in a Shiny app
6 Pros & Cons
| Software | Difficulty | Type | Cost | Usage | Support | Best for |
|---|---|---|---|---|---|---|
| Excel | Easy | GUI | Cheap | Wide | Widespread | Graphs |
| R | Difficult | Code | Free | Increasing | Strong online community | Cutting edge |
| SPSS | Medium | GUI | Expensive | Social Sci. | Manual | Statistics |
| SAS | Difficult | Code | Expensive | Decreasing | Manual | Complex |
7 Move forward
7.1 Partners
7.2 Help documents
demo(graphics)
demo(persp)
demo(image)
demo(plotmath)
demo(nlm)
demo(lm.glm)
demo(smooth)
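Demo names differ from package to package. Calling demo() with only the package argument returns the list of demos a package provides; a small sketch for the base graphics package (the variable name demos is illustrative):

```r
# list the demos shipped with the graphics package
demos <- demo(package = "graphics")
# the Item column holds the names that can be passed to demo()
demos$results[, "Item"]
```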
# ggplot2
example(qplot)
# GGally
example(ggpairs)
# MSG
library(MSG)
demo(basketball)
demo(pointArts)
demo(gradArrows1) # Gradient descent method
7.3 R packages
library(swirl)  # then call swirl() to start the interactive lessons
library(learnr)
run_tutorial("ex-data-basics", "learnr")
7.4 Books
- Beginners:
- Advanced users:
- Chinese users: